Comparing Information-Theoretic Attribute Selection Measures: A Statistical Approach

Authors

  • Ramon López de Mántaras
  • Jesús Cerquides
  • Pere Garcia-Calvés
Abstract

In a previous paper (López de Mántaras, 1991), we introduced a new information-theoretic attribute selection method for decision tree induction. For each node, this method computes a distance between the partition generated by the values of each candidate attribute and the correct partition of the subset of training examples at that node. The chosen attribute is the one whose partition is closest to the correct partition (i.e., the partition that perfectly classifies the training data). In that paper we also formally proved that this distance is not biased towards attributes with a large number of values, in the sense specified by Quinlan (Quinlan, 1986), and we presented initial experimental evidence that the predictive accuracy of the induced trees was not significantly different from that obtained with the most widely used information-theoretic attribute selection measures, namely Quinlan's Gain and Quinlan's Gain Ratio. However, the distance seemed to induce smaller trees, especially when the attributes had different numbers of values. In that paper we could not confirm that the differences were statistically significant because of the small number of experiments we had performed. In this paper we report experimental results that allow us to confirm that the distance induces trees whose size, without loss of accuracy, is not significantly different from that of trees obtained with Quinlan's Gain but is smaller than that of trees obtained with Quinlan's Gain Ratio. These experimental results are supported by a statistical analysis using two statistical hypothesis tests: the sign test and the signed rank test.
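The abstract does not state the distance formula, so the following is only a minimal sketch. It assumes the normalized form (H(A|C) + H(C|A)) / H(A, C) for the distance between the attribute partition A and the class partition C, and it pairs that with an exact two-sided sign test over paired tree sizes. The function names and the sample data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import comb, log2


def entropy(counts, total):
    """Shannon entropy (bits) of a distribution given as raw counts."""
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def normalized_partition_distance(attribute_values, class_labels):
    """Assumed form: (H(A|C) + H(C|A)) / H(A, C), a value in [0, 1]."""
    n = len(class_labels)
    h_a = entropy(Counter(attribute_values).values(), n)
    h_c = entropy(Counter(class_labels).values(), n)
    h_ac = entropy(Counter(zip(attribute_values, class_labels)).values(), n)
    if h_ac == 0:
        return 0.0  # both partitions are trivial, hence identical
    # H(A|C) + H(C|A) = 2 * H(A, C) - H(A) - H(C)
    return (2 * h_ac - h_a - h_c) / h_ac


def sign_test_p_value(sizes_a, sizes_b):
    """Exact two-sided sign test on paired observations; ties are dropped."""
    diffs = [a - b for a, b in zip(sizes_a, sizes_b) if a != b]
    n = len(diffs)
    if n == 0:
        return 1.0
    positives = sum(d > 0 for d in diffs)
    k = min(positives, n - positives)
    # Probability of k or fewer minority signs under a fair coin, doubled.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical data: the attribute that reproduces the classes is at distance 0.
classes = [0, 0, 1, 1]
perfect = ["x", "x", "y", "y"]
unrelated = ["x", "y", "x", "y"]
best = min([perfect, unrelated],
           key=lambda attr: normalized_partition_distance(attr, classes))
```

Under this assumed definition, the attribute chosen at a node is the one that minimizes the distance to the class partition. The sign test then asks whether one measure yields smaller trees than another more often than chance would explain across paired experiments.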

Similar resources

Feature Selection Using Multi Objective Genetic Algorithm with Support Vector Machine

Different approaches have been proposed for feature selection to obtain suitable features subset among all features. These methods search feature space for feature subsets which satisfies some criteria or optimizes several objective functions. The objective functions are divided into two main groups: filter and wrapper methods.  In filter methods, features subsets are selected due to some measu...

Information-Theoretic Measures for Knowledge Discovery and Data Mining

A database may be considered as a statistical population, and an attribute as a statistical variable taking values from its domain. One can carry out statistical and information-theoretic analysis on a database. Based on the attribute values, a database can be partitioned into smaller populations. An attribute is deemed important if it partitions the database such that previously unknown regula...

Multiple attribute decision making with triangular intuitionistic fuzzy numbers based on zero-sum game approach

For many decision problems with uncertainty, triangular intuitionistic fuzzy number is a useful tool in expressing ill-known quantities. This paper develops a novel decision method based on zero-sum game for multiple attribute decision making problems where the attribute values take the form of triangular intuitionistic fuzzy numbers and the attribute weights are unknown. First, a new value ind...

Attribute Selection for the Scheduling of Flexible Manufacturing Systems Based on Fuzzy Set-theoretic Approach and Genetic Algorithm

Assigning proper dispatching rules dynamically has been shown to enhance various performance measures for a flexible manufacturing system (FMS). To achieve this, real-time salient information of the system is extracted and then a rule’s dispatching mechanism is built for the scheduling task. For a dynamic scheduled FMS, two critical issues dominate the performance; the first is the selection of...

A DEA-based Approach for Multi-objective Design of Attribute Acceptance Sampling Plans

Acceptance sampling (AS), as one of the main fields of statistical quality control (SQC), involves a system of principles and methods to make decisions about accepting or rejecting a lot or sample. For attributes, the design of a single AS plan generally requires determination of sample size, and acceptance number. Numerous approaches have been developed for optimally selection of design parameters...

Journal:
  • AI Commun.

Volume 11

Publication date: 1998